Improve Int128 ABI mismatch detection. #3170

Merged
maleadt merged 2 commits into main from tb/int128_abi
Jun 8, 2026

@maleadt maleadt commented Jun 8, 2026

Member

For some context on why we started doing this in the first place: the NVPTX back-end aligns 128-bit integers to 16 bytes. Julia only started doing that in 1.12; on 1.10 and 1.11, Int128 is 8-byte aligned. So on those versions an aggregate with a (U)Int128 field has a different layout on the host than on the device. For example, struct {Int64; Int128} is 24 bytes on the host (Int128 field at offset 8) and 32 bytes on the device (Int128 field at offset 16). Codegen always uses the device layout, so the two disagree.
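
To make the arithmetic above concrete, here is a minimal sketch of the usual C-style layout rule (align each field up to its alignment, then pad the total size to the largest alignment). It is written in Python so the result does not depend on the Julia version running it. The names `SIZES`, `align_up` and `struct_layout` are hypothetical helpers for illustration only; they are not part of CUDA.jl or of this PR.

```python
# Field sizes in bytes. Every type here is naturally aligned (alignment == size),
# except (U)Int128, whose alignment is passed in as a parameter (assumption:
# 8 bytes models the Julia 1.10/1.11 host, 16 bytes models the NVPTX device).
SIZES = {"Int64": 8, "Int128": 16, "UInt128": 16}


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def struct_layout(field_types, int128_align):
    """Return (field offsets, total size) for a struct with the given fields."""
    offsets = []
    offset = 0
    max_align = 1
    for t in field_types:
        size = SIZES[t]
        align = int128_align if t in ("Int128", "UInt128") else size
        offset = align_up(offset, align)  # insert padding before the field
        offsets.append(offset)
        offset += size
        max_align = max(max_align, align)
    # Tail padding: the struct size is a multiple of its largest alignment.
    return offsets, align_up(offset, max_align)


host = struct_layout(["Int64", "Int128"], int128_align=8)
device = struct_layout(["Int64", "Int128"], int128_align=16)
print(host)    # ([0, 8], 24)
print(device)  # ([0, 16], 32)
```

With 8-byte alignment the Int128 field lands at offset 8 for a 24-byte struct; with 16-byte alignment it lands at offset 16 for a 32-byte struct, matching the numbers quoted above.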

@github-actions github-actions Bot left a comment

Contributor

CUDA.jl Benchmarks

Benchmark suite Current: 34dc577 Previous: 1caa3ad Ratio
array/accumulate/Float32/1d 99383 ns 100770 ns 0.99
array/accumulate/Float32/dims=1 75149 ns 76381 ns 0.98
array/accumulate/Float32/dims=1L 1629231 ns 1630799 ns 1.00
array/accumulate/Float32/dims=2 140654 ns 141553 ns 0.99
array/accumulate/Float32/dims=2L 653360 ns 653429 ns 1.00
array/accumulate/Int64/1d 118282 ns 119282 ns 0.99
array/accumulate/Int64/dims=1 78853 ns 80346 ns 0.98
array/accumulate/Int64/dims=1L 1724276 ns 1726167 ns 1.00
array/accumulate/Int64/dims=2 153119 ns 154865 ns 0.99
array/accumulate/Int64/dims=2L 959403 ns 960762 ns 1.00
array/broadcast 18258 ns 18636 ns 0.98
array/construct 1213.6 ns 1242.5 ns 0.98
array/copy 16518 ns 16748 ns 0.99
array/copyto!/cpu_to_gpu 212993 ns 214495 ns 0.99
array/copyto!/gpu_to_cpu 280084 ns 281315 ns 1.00
array/copyto!/gpu_to_gpu 10334 ns 10356 ns 1.00
array/iteration/findall/bool 132639 ns 134688 ns 0.98
array/iteration/findall/int 146104 ns 148663 ns 0.98
array/iteration/findfirst/bool 68874 ns 70356 ns 0.98
array/iteration/findfirst/int 70427 ns 71778 ns 0.98
array/iteration/findmin/1d 65071 ns 68513 ns 0.95
array/iteration/findmin/2d 101226 ns 101687 ns 1.00
array/iteration/logical 189699 ns 195448 ns 0.97
array/iteration/scalar 64535 ns 66072 ns 0.98
array/permutedims/2d 49387 ns 50059 ns 0.99
array/permutedims/3d 50674 ns 50890 ns 1.00
array/permutedims/4d 50651 ns 51304 ns 0.99
array/random/rand/Float32 11381 ns 11937 ns 0.95
array/random/rand/Int64 22105 ns 24454 ns 0.90
array/random/rand!/Float32 7847 ns 8092.666666666667 ns 0.97
array/random/rand!/Int64 17412 ns 20949 ns 0.83
array/random/randn/Float32 34683 ns 36329 ns 0.95
array/random/randn!/Float32 23723 ns 24624 ns 0.96
array/reductions/mapreduce/Float32/1d 33101 ns 33603 ns 0.99
array/reductions/mapreduce/Float32/dims=1 38329 ns 38517 ns 1.00
array/reductions/mapreduce/Float32/dims=1L 50058 ns 50114 ns 1.00
array/reductions/mapreduce/Float32/dims=2 55785 ns 56468 ns 0.99
array/reductions/mapreduce/Float32/dims=2L 66656 ns 67374 ns 0.99
array/reductions/mapreduce/Int64/1d 38514 ns 39976 ns 0.96
array/reductions/mapreduce/Int64/dims=1 41371 ns 41660 ns 0.99
array/reductions/mapreduce/Int64/dims=1L 86330 ns 86590 ns 1.00
array/reductions/mapreduce/Int64/dims=2 58183 ns 59096 ns 0.98
array/reductions/mapreduce/Int64/dims=2L 81985 ns 82999 ns 0.99
array/reductions/reduce/Float32/1d 33246 ns 33460 ns 0.99
array/reductions/reduce/Float32/dims=1 38133 ns 38736 ns 0.98
array/reductions/reduce/Float32/dims=1L 50073 ns 50547 ns 0.99
array/reductions/reduce/Float32/dims=2 55632 ns 56305 ns 0.99
array/reductions/reduce/Float32/dims=2L 66888 ns 67363 ns 0.99
array/reductions/reduce/Int64/1d 39180 ns 40096 ns 0.98
array/reductions/reduce/Int64/dims=1 40997 ns 41717 ns 0.98
array/reductions/reduce/Int64/dims=1L 86233 ns 86750 ns 0.99
array/reductions/reduce/Int64/dims=2 57731 ns 58479 ns 0.99
array/reductions/reduce/Int64/dims=2L 82615 ns 83983 ns 0.98
array/reverse/1d 16748 ns 17202 ns 0.97
array/reverse/1dL 67810 ns 68074 ns 1.00
array/reverse/1dL_inplace 65177 ns 65359 ns 1.00
array/reverse/1d_inplace 8217.666666666666 ns 8376.666666666666 ns 0.98
array/reverse/2d 19882 ns 20409 ns 0.97
array/reverse/2dL 71655 ns 72447 ns 0.99
array/reverse/2dL_inplace 65002 ns 64948 ns 1.00
array/reverse/2d_inplace 9597 ns 9770 ns 0.98
array/sorting/1d 2655304 ns 2653489 ns 1.00
array/sorting/2d 1032380 ns 1034091 ns 1.00
array/sorting/by 3180105 ns 3181225 ns 1.00
cuda/synchronization/context/auto 1128.8 ns 1135.5 ns 0.99
cuda/synchronization/context/blocking 894.1276595744681 ns 939.7407407407408 ns 0.95
cuda/synchronization/context/nonblocking 6043 ns 6181.8 ns 0.98
cuda/synchronization/stream/auto 994.8181818181819 ns 991.6 ns 1.00
cuda/synchronization/stream/blocking 799.989247311828 ns 841.3783783783783 ns 0.95
cuda/synchronization/stream/nonblocking 6045 ns 6139.8 ns 0.98
integration/byval/reference 143105 ns 143264 ns 1.00
integration/byval/slices=1 145192 ns 145361 ns 1.00
integration/byval/slices=2 283765 ns 283971 ns 1.00
integration/byval/slices=3 422310 ns 422479 ns 1.00
integration/cudadevrt 101577 ns 101731 ns 1.00
integration/volumerhs 8988630 ns 8999650 ns 1.00
kernel/indexing 12590 ns 12703 ns 0.99
kernel/indexing_checked 13321 ns 13354 ns 1.00
kernel/launch 2089.3333333333335 ns 2086 ns 1.00
kernel/occupancy 734.3741007194244 ns 693.3698630136986 ns 1.06
kernel/rand 15863 ns 15890 ns 1.00
latency/import 3864457489 ns 3853824190 ns 1.00
latency/precompile 4639386595 ns 4635422433 ns 1.00
latency/ttfp 4559920169 ns 4550893912 ns 1.00

This comment was automatically generated by workflow using github-action-benchmark.

codecov Bot commented Jun 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 16.31%. Comparing base (1caa3ad) to head (34dc577).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3170      +/-   ##
==========================================
+ Coverage   10.82%   16.31%   +5.48%     
==========================================
  Files         123      124       +1     
  Lines        9479     9875     +396     
==========================================
+ Hits         1026     1611     +585     
+ Misses       8453     8264     -189     

@maleadt maleadt merged commit 642cf8d into main Jun 8, 2026
2 checks passed
@maleadt maleadt deleted the tb/int128_abi branch June 8, 2026 16:20